Add sampling API back to LlamaTokenDataArray; Add DRY and XTC Samplers #594
Conversation
This looks good. I've got a couple of comments that don't require changes, but I would like your thoughts on them before I merge.
I've gone ahead and overhauled the sampling API in a manner that resolves your third review comment, but which is very different from what came before. If you think that this latest version of the API is too far from the raw API or complicates things too much, let me know.
I think there's a bit of thinking to do around lifetimes before this gets merged.
If you want to remove the samplers that require lifetime logic, it would be nice to get a first iteration of the trivial samplers in; then we can add the more complex ones in a separate PR. There's a lot of great work here that I'd hate to see kept out of main because we're battling to encode model lifetime guarantees into the samplers.
I don't mind the new API. I weakly preferred the "closer to direct calls" one.
I think a LlamaSampler struct with a collection of factory functions would be preferable to the params + enum approach. This would also allow different lifetime signatures for different samplers (for example, most constructors could return LlamaSampler<'static> when the sampler owns everything, while a sampler that takes a reference to a model could return LlamaSampler<'a>).
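The factory-function idea above can be sketched as follows. This is a minimal illustration with stand-in types, not the crate's actual API: `Model`, the `name`/`model` fields, and the `temp`/`grammar` constructors are all hypothetical here, chosen only to show how different constructors can return different lifetime signatures.

```rust
// Stand-in for the real model type.
struct Model {
    vocab_size: usize,
}

// A sampler that may or may not borrow a model.
struct LlamaSampler<'a> {
    name: &'static str,
    model: Option<&'a Model>,
}

impl<'a> LlamaSampler<'a> {
    // Owns all of its state, so callers get a LlamaSampler<'static>.
    fn temp(_t: f32) -> LlamaSampler<'static> {
        LlamaSampler { name: "temp", model: None }
    }

    // Borrows the model, so the sampler cannot outlive it.
    fn grammar(model: &'a Model) -> LlamaSampler<'a> {
        LlamaSampler { name: "grammar", model: Some(model) }
    }
}

fn main() {
    let owned: LlamaSampler<'static> = LlamaSampler::temp(0.8);
    assert_eq!(owned.name, "temp");

    let model = Model { vocab_size: 32_000 };
    let borrowed = LlamaSampler::grammar(&model);
    assert_eq!(borrowed.model.unwrap().vocab_size, 32_000);
}
```

The payoff of this shape is that the borrow checker, rather than documentation, enforces that model-referencing samplers are dropped before the model is.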
Okay, I've rewritten the API again, and I think this is about as simple as it will get while still allowing LlamaSampler to be either a chain or a single sampler. It is also about as close to the raw API as it was before this PR. I will write documentation for the new methods on LlamaSampler later today.
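One way the "either a chain or a single sampler" shape can be modeled is with a recursive enum; the sketch below is purely illustrative (the variant names and the `steps` helper are assumptions, not the PR's actual implementation):

```rust
// A sampler is either a single primitive step or a chain of samplers.
#[derive(Debug)]
enum LlamaSampler {
    Temp(f32),
    TopK(i32),
    Chain(Vec<LlamaSampler>),
}

impl LlamaSampler {
    fn chain(samplers: Vec<LlamaSampler>) -> Self {
        LlamaSampler::Chain(samplers)
    }

    // Number of primitive steps; a single sampler counts as a chain of one.
    fn steps(&self) -> usize {
        match self {
            LlamaSampler::Chain(v) => v.iter().map(|s| s.steps()).sum(),
            _ => 1,
        }
    }
}

fn main() {
    let single = LlamaSampler::Temp(0.8);
    assert_eq!(single.steps(), 1);

    let chain = LlamaSampler::chain(vec![single, LlamaSampler::TopK(40)]);
    assert_eq!(chain.steps(), 2);
}
```

Because both cases share one type, any function that accepts a `LlamaSampler` works identically on a single step or a full pipeline.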
Minor nitpicks. This looks good; address or dismiss them as you see fit, and once you're happy I will merge as long as tests pass.
I think this is also one of the parts of the library that can be reasonably tested (stuff that does inference or loads a model is too hard), so a couple of tests (preferably in docs) would be lovely.
Thanks for the great PR and patience with the back and forth.
The changes in 0.1.84 removed the ability to use samplers with `LlamaTokenDataArray`s. These changes add the sampling methods of version 0.1.83 back to `LlamaTokenDataArray`, add support for the XTC and DRY samplers, and allow users to apply a `LlamaSampler` to a `LlamaTokenDataArray`.

These changes enable users to mix custom samplers with llama.cpp samplers, and make the `LlamaTokenDataArray` struct useful again. Let me know if you'd like to see any changes!